Rough K-modes Clustering Algorithm Based on Entropy

Authors

  • Qi Duan
  • You Long Yang
  • Yang Li

Abstract

Cluster analysis is an important technique in data mining, and categorical data clustering has received a great deal of attention in recent years. Some existing algorithms for clustering categorical data do not consider the importance of individual attributes, which reduces the efficiency of the clustering analysis and limits its application. In this paper, we propose a novel rough k-modes clustering algorithm based on entropy. First, we use information entropy to define a new dissimilarity measure that takes the importance of each attribute into account and improves clustering quality. Then, applying rough set theory, we use upper and lower approximations to handle uncertain clusters, which offers an improved solution for uncertainty analysis. Finally, our experimental results show that the proposed algorithm performs better than other conventional clustering algorithms in terms of clustering accuracy, purity, and F1-measure.
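
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes that attribute importance is the attribute's Shannon entropy normalised over all attributes, that dissimilarity is a weighted simple-matching distance to a cluster mode, and that the lower/upper approximation is decided by a distance threshold `epsilon` (a convention borrowed from rough k-means). All function names and the `epsilon` parameter are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def attribute_entropy_weights(data):
    """Assumed weighting: Shannon entropy of each attribute, normalised to sum to 1."""
    n_attrs = len(data[0])
    total = len(data)
    entropies = []
    for j in range(n_attrs):
        counts = Counter(row[j] for row in data)
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    s = sum(entropies)
    if s == 0:
        # Every attribute is constant: fall back to equal weights.
        return [1.0 / n_attrs] * n_attrs
    return [h / s for h in entropies]


def weighted_mismatch(x, mode, weights):
    """Weighted simple-matching dissimilarity between an object and a mode."""
    return sum(w for a, b, w in zip(x, mode, weights) if a != b)


def rough_assign(x, modes, weights, epsilon=0.1):
    """Return (lower, upper) for one object.

    upper: indices of all clusters whose mode is within epsilon of the nearest mode.
    lower: the single cluster index if the object is unambiguous, otherwise None.
    """
    dists = [weighted_mismatch(x, m, weights) for m in modes]
    d_min = min(dists)
    upper = [k for k, d in enumerate(dists) if d - d_min <= epsilon]
    lower = upper[0] if len(upper) == 1 else None
    return lower, upper


if __name__ == "__main__":
    data = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
    weights = attribute_entropy_weights(data)
    modes = [("a", "x"), ("b", "y")]
    for row in data:
        print(row, rough_assign(row, modes, weights))
```

An object that is equally close to two modes lands only in their upper approximations (its `lower` is `None`), which is how the sketch represents an uncertain cluster membership. Whether high-entropy attributes should count as more or less important is a design choice that the abstract leaves open.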

Similar resources

An Optimization K-Modes Clustering Algorithm with Elephant Herding Optimization Algorithm for Crime Clustering

In past decades, the detection and prevention of crime required several years of research and analysis. Today, however, thanks to smart systems based on data mining techniques, it is possible to detect and prevent crime in considerably less time. Smart techniques based on classification and clustering can classify and cluster crime-related samples. The most important factor in the c...

A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset

Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to the weights of attributes is proposed. A comprehensive review of electrocardiogram signal classification and segmentation methodologies indicates that ECG analysis should use algorithms that can effectively handle the nonstationarity and uncertainty of the signals. Ex...

A dissimilarity measure for the k-Modes clustering algorithm

Clustering is one of the most important data mining techniques; it partitions data according to some similarity criterion. The problem of clustering categorical data has recently attracted much attention from the data mining research community. As an extension of the k-Means algorithm, the k-Modes algorithm has been widely applied to categorical data clustering by replacing means with modes...

Hierarchical clustering algorithm for categorical data using a probabilistic rough set model

Several clustering analysis techniques for categorical data exist to divide similar objects into groups. Some are able to handle uncertainty in the clustering process, whereas others have stability issues. In this paper, we propose a new technique called TMDP (Total Mean Distribution Precision) for selecting the partitioning attribute based on probabilistic rough set theory. On the basis of thi...

Use of the Improved Frog-Leaping Algorithm in Data Clustering

Clustering is a well-known data mining technique in which data with similar properties are grouped into the same category. The K-means algorithm is one of the simplest clustering algorithms, but it has the disadvantages of being sensitive to the initial values of the clusters and of converging to a local optimum. In recent years, several algorithms based on evolutionary algorithms have been proposed for cluster...

Publication date: 2016